Abstract

This paper extends the algorithmic schemes proposed in [11] and [12] to the minimization of the
sum of a composite objective function and a convex function. Two proximal point-type schemes
are provided and their global convergence is investigated. A worst-case complexity bound
is also estimated under certain Lipschitz and nondegeneracy conditions. The algorithm is
then accelerated to obtain a faster convergence rate in the strongly convex case.

1 Introduction

In this paper we consider the following minimization problem:

$$\min \{ f(x) := g(x) + \varphi(F(x)) \mid x \in \Omega \}, \tag{P}$$

where $\Omega$ is a nonempty, closed convex subset of $\mathbb{R}^n$, $g : \mathbb{R}^n \to \mathbb{R}$, $F : \mathbb{R}^n \to \mathbb{R}^m$ is continuously
differentiable on an open set $\mathcal{F}$ of $\mathbb{R}^n$, and $\varphi : \mathbb{R}^m \to \mathbb{R}$ is proper, lower semicontinuous and
convex.

Problem (P) covers many practical problems in optimization, signal processing and statistics.
For instance, when $F$ reduces to the identity mapping, problem (P) collapses to an optimization
problem on a convex set. If we take $g = 0$ and $\varphi(\cdot) = \|\cdot\|_2$ then (P) reduces to a classical least
squares problem. Moreover, when $\varphi(\cdot) = \|\cdot\|_*$ with a given norm (e.g., the Huber norm or the $\ell_1$-norm)
this problem becomes a sparse least squares problem. Sparse least squares problems often appear
in signal processing and statistics (see, e.g. [3J [501 [H] ). Another example of (P) is the $\ell_1$-penalized
problem of a nonlinear program, which is represented as

$$\min \{ p(x) + c(\|q(x)\|_1 + [r(x)]_+) \mid x \in \Omega \}, \tag{1.1}$$

where $p$ is the objective function, $q(x) = 0$ are the equality constraints and $r(x) \le 0$ are the inequality
constraints of the original problem, $c > 0$ is a penalty parameter, $\Omega$ represents the simple
constraints, which are assumed to be convex, and $[z]_+ := \sum_j \max\{0, z_j\}$. If we define $g := p$,
$\varphi(u,v) := c(\|u\|_1 + [v]_+)$ and $F(x) := (q(x)^T, r(x)^T)^T$ then the $\ell_1$-penalized problem (1.1) can be reformulated
equivalently as (P). In some cases, we want to compute the minimum norm solution of a nonlinear
system $F(x) = 0$. This problem can be reformulated as

$$\min_x \; \rho\|x\|_2^2 + \|F(x)\|, \tag{1.2}$$

where $\rho > 0$ is a given parameter. This problem is indeed a particular form of (P).

The last example of interest is the problem resulting from nonlinear programming using the
sharp augmented Lagrangian function [6, 19]. This problem has the following form:

$$\min \{ H(x, u) := p(x) + u^T q(x) + c\|q(x)\|_1 \mid x \in \Omega \},$$

where $p$ is the objective function, $q$ is the function of the equality constraints of the original problem, and
$c > 0$ is a penalty parameter. If we define $g(x) := p(x) + u^T q(x)$, $\varphi(\cdot) := c\|\cdot\|_1$ and $F(x) := q(x)$
then this problem becomes a particular case of (P).

Let us look at the literature on theory and methods related to problem (P). They can be roughly
classified into two frameworks. The first class is the problem of minimizing the sum of two objective
functions and the second one is minimizing a composite objective function. These two classes,
of course, can be theoretically combined in a unified framework, as we will see later. However, it is
more convenient to exploit the special structure of the problem if we consider it in the form of (P).
The minimization of the sum of two objective functions, as well as the minimization of a
composite objective function, were investigated early on in many research papers. For instance,
Mine et al. [9] considered the problem of minimizing the sum of two objective functions, where the
first function is assumed to be smooth and the other one is assumed to be simple. Fukushima and
Mine [3] then considered the problem of minimizing a composite objective function, where the outer
function is assumed to be nondifferentiable. A popular case of minimizing a composite objective
function is the least squares problem, where the outer function is of the form $\|\cdot\|_2^2$. This problem
class was then extended to a general outer function that is assumed to be convex (see, e.g.,
[21 [71 [TTJ [22] ). Alternatively, a myriad of research papers consider the minimization
of the sum of two objective functions (see, e.g., [121 HH1 [23 [23]). The methods for solving this
problem have been studied quite extensively. For instance, DC (difference of two convex functions)
decomposition and forward-backward splitting methods are methods for solving some subclasses of
this problem.

In our framework, the algorithm for solving problem (P) can, on the one hand, be considered
as an extension of the gradient schemes that were considered in [11] for solving nonlinear systems
via nonsmooth least squares problems and then in [12] for minimizing the composite of two objective
functions. On the other hand, it can be regarded as a restricted variant of the proximal point algorithm
framework in [7]. Here, if we define $c(x) := (g(x), F(x)^T, x^T)^T$ and $h(u,v,x) := u + \varphi(v) + \delta_\Omega(x)$,
which is convex, where $\delta_\Omega$ is the indicator function of the convex set $\Omega$, i.e.

$$\delta_\Omega(x) := \begin{cases} 0 & \text{if } x \in \Omega, \\ +\infty & \text{otherwise}, \end{cases}$$

then problem (P) can be reformulated as

$$\min_x \; h(c(x)). \tag{1.3}$$



In [7], the authors provided a generic framework, the so-called proximal point method, for solving (1.3).
The proposed algorithm can be considered as a generalization of the classical proximal point methods
introduced by Martinet [5]. The theory in that paper is quite general and covers many classes of
problems in optimization. This formulation was considered earlier by Burke and Ferris in [2].

In this paper, motivated by [17], a generic algorithmic framework called the sequential convex
programming (SCP) method for solving nonlinear programs, we continue extending the idea of
Nesterov in [11] and [12] to problem (P). The main idea of the SCP method is to keep the convex
substructure of the original problem as much as possible and to convexify the nonconvex part by
exploiting state-of-the-art theory and methods in convex optimization. Let us consider a
nonlinear programming problem of the form:

$$\min_x \; p(x) \quad \text{s.t.} \quad q(x) = 0, \; x \in \Omega, \tag{1.4}$$

where the objective function $p : \mathbb{R}^n \to \mathbb{R}$ and the set $\Omega \subseteq \mathbb{R}^n$ are assumed to be convex, and $q : \mathbb{R}^n \to \mathbb{R}^m$ is
assumed to be twice continuously differentiable. The SCP method generates an iterative sequence
$\{x^k\}_{k \ge 0}$ starting from $x^0 \in \Omega$, and computes $x^{k+1}$ by solving the convex subproblem:

$$\min_d \; p(x^k + d) \quad \text{s.t.} \quad q(x^k) + q'(x^k)d = 0, \; x^k + d \in \Omega, \tag{1.5}$$

where $q'(x^k)$ is the Jacobian matrix of $q$ at $x^k$, to get a solution $d^k$, and then setting $x^{k+1} := x^k + \alpha_k d^k$,
where $\alpha_k \in (0, 1]$ is a given step size.

Note that the convex subproblem (1.5) may in general have no solution because the linearized
constraints can be inconsistent, so the SCP algorithm may fail in practice. A popular strategy to handle the
linearized inconsistency is to relax the subproblem (1.5) by introducing slack variables. For instance,
this problem can be relaxed as follows:

$$\min_{d,t,s} \; p(x^k + d) + c\sum_{i=1}^{m}(t_i + s_i) \quad \text{s.t.} \quad q(x^k) + q'(x^k)d = t - s, \; x^k + d \in \Omega, \; t, s \ge 0, \tag{1.6}$$

where $c > 0$ is a penalty parameter. The relaxed problem (1.6) can be reformulated equivalently as

$$\min_d \; p(x^k + d) + c\|q(x^k) + q'(x^k)d\|_1 \quad \text{s.t.} \quad x^k + d \in \Omega.$$
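
The equivalence between the slack formulation (1.6) and the $\ell_1$-penalty form can be checked numerically. The following is a minimal sketch, not part of the original paper: for a fixed residual vector `v` (standing for $q(x^k) + q'(x^k)d$), the cheapest nonnegative slacks with $t - s = v$ are the positive and negative parts of `v`, and their total cost equals $\|v\|_1$. The helper names are hypothetical.

```python
def split_into_slacks(v):
    """Return nonnegative (t, s) with t - s == v that minimize sum(t) + sum(s).

    The minimizer takes t as the positive part and s as the negative part of v.
    """
    t = [max(vi, 0.0) for vi in v]
    s = [max(-vi, 0.0) for vi in v]
    return t, s


def l1_norm(v):
    """Plain l1-norm of a list of floats."""
    return sum(abs(vi) for vi in v)


# Illustrative residual vector (an assumption, not data from the paper).
v = [1.5, -2.0, 0.0, 0.25]
t, s = split_into_slacks(v)
slack_cost = sum(t) + sum(s)
```

Any other feasible pair $(t, s)$ adds the same nonnegative amount to both $t_i$ and $s_i$, which can only increase the cost, so the penalty term in (1.6) collapses to $c\|v\|_1$.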

However, if the objective function $p$ is not strongly convex then the search direction $d^k$ may not be
a descent direction. Therefore, a regularization term $\frac{\rho}{2}\|d\|^2$ should be added to ensure that $d^k$ is a
descent direction. The subproblem then becomes:

$$\min_d \; p(x^k + d) + c\|q(x^k) + q'(x^k)d\|_1 + \frac{\rho}{2}\|d\|^2 \quad \text{s.t.} \quad x^k + d \in \Omega.$$

This problem has the form of the subproblem $P_2(x)$ below.

It has been proved in [17] that, under mild conditions, the SCP method converges locally to a stationary
point of the original problem (1.4) at a linear rate. The global convergence behaviour of the
SCP method, however, has not yet been considered.

The aim of this paper is to consider the theoretical aspects of the global behaviour of
proximal point methods for solving problem (P), as a bridge to the SCP method. The results in this
paper are preliminary and should be investigated further for practical purposes. We first propose
a generic algorithmic scheme which is based on two different subproblems. We prove some technical
results and the convergence of the algorithmic scheme. For the unconstrained case, we are able to
provide a worst-case global complexity bound under certain conditions. When $g$ is strongly convex,
the algorithm is accelerated to get a faster global convergence rate, using a technique that is commonly used in gradient
methods for convex optimization [101 113) .

Throughout the paper, we require the following assumption. 

A.1. The proper, lower semicontinuous and convex function $\varphi$ is Lipschitz continuous on the range
space of $F'(x)$, $\mathcal{R}_F := \mathrm{range}\,F'(x)$, for all $x \in \mathbb{R}^n$ with a global Lipschitz constant $L_\varphi > 0$, i.e.

$$|\varphi(u) - \varphi(v)| \le L_\varphi\|u - v\|, \quad \forall u, v \in \mathcal{R}_F. \tag{1.7}$$

An example of the function $\varphi$ is $\varphi(u) = \|u\|$, where $\|\cdot\|$ can be any norm. For instance, it
can be:

i. The $\ell_1$-norm that often appears in penalty methods, $\varphi(\cdot) = \|\cdot\|_1$;

ii. The Euclidean norm frequently used in Gauss-Newton and regularization methods,
$\varphi(\cdot) = \|\cdot\|_2$;

iii. The Huber norm that is defined by $\varphi(u) = \sum_{i=1}^{m} \sigma(u_i)$, where $\sigma(t)$ is defined as

$$\sigma(t) := \begin{cases} \frac{1}{2}t^2 & \text{if } |t| \le \tau, \\ \tau|t| - \frac{1}{2}\tau^2 & \text{otherwise}, \end{cases}$$

where $\tau > 0$ is given (see, e.g., [1]).

Generally, the norm $\|\cdot\|$ in the definition of $\varphi$ can be any norm, which gives us freedom of choice. Thus,
in practice, we can choose the norm $\|\cdot\|$ such that the Lipschitz constant is as small as possible.
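
As a small illustration of Assumption A.1 for the Huber example, the sketch below evaluates the Huber function and an empirical Lipschitz ratio. It assumes the standard Huber definition (quadratic for $|t| \le \tau$, linear with slope $\tau$ beyond), which may differ in normalization from the variant intended in the paper. Under that assumption each scalar term is $\tau$-Lipschitz, so the sum is Lipschitz with constant $\tau\sqrt{m}$ with respect to the Euclidean norm. All function names are hypothetical.

```python
import math


def huber_scalar(t, tau):
    """Standard Huber function of a scalar t with threshold tau > 0 (assumed form)."""
    if abs(t) <= tau:
        return 0.5 * t * t
    return tau * abs(t) - 0.5 * tau * tau


def huber(u, tau):
    """phi(u) = sum_i sigma(u_i) for a list of floats u."""
    return sum(huber_scalar(ui, tau) for ui in u)


def lipschitz_ratio(u, v, tau):
    """|phi(u) - phi(v)| / ||u - v||_2 for two distinct points u and v."""
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
    return abs(huber(u, tau) - huber(v, tau)) / dist


tau = 1.0
u = [3.0, -2.0, 0.5]
v = [-1.0, 4.0, 0.0]
ratio = lipschitz_ratio(u, v, tau)
upper = tau * math.sqrt(len(u))  # Lipschitz constant w.r.t. the Euclidean norm
```

The ratio computed for any pair of points stays below `upper`, which is the kind of global constant $L_\varphi$ that A.1 asks for.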

For simplicity of discussion, the Euclidean norm is assumed to be used throughout this paper.
We denote by $\nabla g$ the gradient vector of a scalar function $g$ from $\mathbb{R}^n$ to $\mathbb{R}$, and by $F'$ the Jacobian matrix
of a vector function $F$ from $\mathbb{R}^n$ to $\mathbb{R}^m$. For a convex function $f : C \to \mathbb{R} \cup \{+\infty\}$, where $C$ is a
convex set in $\mathbb{R}^n$, $\partial f(x)$ denotes the subdifferential of $f$ at $x$. Each element $\xi \in \partial f(x)$ is called a
subgradient of $f$ at $x$. The function $f$ is said to be strongly convex with a parameter $\tau > 0$ on $C$ if
$f(\cdot) - \frac{\tau}{2}\|\cdot\|^2$ is convex on $C$ (see, e.g., [T]). For a given set $X \subseteq \mathbb{R}^n$, $\mathrm{int}(X)$ denotes the set of interior
points of $X$.

Since problem (P) is nonconvex, a local minimizer (if it exists) may not be a global one. A point
$x^* \in \Omega$ is said to be a stationary (critical) point of problem (P) if

$$0 \in \partial g(x^*) + F'(x^*)^T \partial\varphi(F(x^*)) + N_\Omega(x^*), \tag{1.8}$$

where $\partial g(x^*)$ is the subdifferential of $g$ at $x^*$ if $g$ is proper, lower semicontinuous and convex, and reduces to $\nabla g(x^*)$,
the gradient vector of $g$, if $g$ is differentiable; $F'(x^*)$ is the Jacobian matrix of $F$ at $x^*$ and $F'(x^*)^T$
is its adjoint operator; $\partial\varphi(\bar F)$ is the subdifferential of $\varphi$ at $\bar F := F(x^*)$; and $N_\Omega(x^*)$ is the normal
cone of the convex set $\Omega$ at $x^*$, i.e.:

$$N_\Omega(x^*) := \begin{cases} \{w \in \mathbb{R}^n \mid w^T(y - x^*) \ge 0, \; y \in \Omega\} & \text{if } x^* \in \Omega, \\ \emptyset & \text{otherwise}. \end{cases}$$

Here, we implicitly use the chain rule, which is assumed to hold in our problem setting.
The condition (1.8) is referred to as a necessary optimality condition for (P). This condition can also be
expressed as follows:

$$\partial\varphi(\bar F) \cap \{v \mid -F'(x^*)^T v \in \partial g(x^*) + N_\Omega(x^*)\} \ne \emptyset,$$

where $\bar F = F(x^*)$. Let us denote by $S^*$ the set of critical points of (P), which is assumed to be
nonempty.

For a given $x \in \Omega$, we consider the following subproblem:

$$\min_d \left\{ g(x) + \nabla g(x)^T d + \varphi(F(x) + F'(x)d) + \frac{\rho}{2}\|d\|^2 \;\middle|\; x + d \in \Omega \right\}, \tag{$P_1(x)$}$$

where $\rho > 0$ is a regularization parameter. Since $\Omega$ is nonempty, closed and convex, and $\varphi$ is proper,
lower semicontinuous and convex, the objective of this problem is strongly convex, and thus the problem has a unique solution.

Alternatively, when $g$ is proper, lower semicontinuous and convex, subproblem $P_1(x)$ is slightly
changed to:

$$\min_d \left\{ g(x + d) + \varphi(F(x) + F'(x)d) + \frac{\rho}{2}\|d\|^2 \;\middle|\; x + d \in \Omega \right\}. \tag{$P_2(x)$}$$

This problem is also strongly convex and therefore has a unique solution.

A generic algorithmic framework for solving problem (P) is briefly described as follows:

1. Initialization: Choose an initial point $x^0 \in \Omega$ and a parameter $\rho_0 > 0$.

2. Main iteration: For each $k = 0, 1, \dots$, solve the strongly convex subproblem $P_1(x^k)$ (or
$P_2(x^k)$) to get the unique solution $d^k$. If the stopping criterion is not satisfied then set $x^{k+1} :=
x^k + \alpha_k d^k$ for a given step size $\alpha_k \in (0, 1]$, update $\rho_k$ (if necessary) and repeat.
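
The two steps above can be sketched in the simplest possible setting: $n = m = 1$, $\Omega = \mathbb{R}$ and $\varphi = |\cdot|$, for which subproblem $P_1(x)$ has a closed-form solution. This is an illustrative sketch only, with hypothetical helper names, a fixed $\rho$, and a made-up test problem $g(x) = \frac{1}{2}x^2$, $F(x) = x^2 - 1$; it is not the authors' implementation.

```python
def solve_p1_scalar(a, b, c, rho):
    """Minimize a*d + |b + c*d| + 0.5*rho*d**2 over d in R (rho > 0).

    Here a = g'(x), b = F(x), c = F'(x). The objective is strongly convex and
    piecewise quadratic, so the minimizer is either a stationary point of one
    smooth branch or the kink where b + c*d == 0.
    """
    if c == 0:
        return -a / rho
    d_plus = -(a + c) / rho           # stationary point of the branch b + c*d > 0
    if b + c * d_plus > 0:
        return d_plus
    d_minus = -(a - c) / rho          # stationary point of the branch b + c*d < 0
    if b + c * d_minus < 0:
        return d_minus
    return -b / c                     # neither branch is consistent: the kink


def generic_scheme(dg, F, dF, x0, rho, alpha=1.0, tol=1e-10, max_iter=100):
    """Repeat x <- x + alpha * d(x), where d(x) solves P1(x), until |d| <= tol."""
    x = x0
    for _ in range(max_iter):
        d = solve_p1_scalar(dg(x), F(x), dF(x), rho)
        if abs(d) <= tol:
            break
        x = x + alpha * d
    return x


# Hypothetical test problem: f(x) = 0.5*x**2 + |x**2 - 1|, minimized at x = 1 for x > 0.
x_star = generic_scheme(dg=lambda x: x,
                        F=lambda x: x * x - 1.0,
                        dF=lambda x: 2.0 * x,
                        x0=3.0, rho=3.0)
```

With $L_g = 1$, $L_\varphi = 1$ and $L_F = 2$ for this example, the choice $\rho = 3$ equals $L_g + L_\varphi L_F$. In this particular run every subproblem solution sits at the kink of the linearized term, so the iteration coincides with Newton's method applied to $F(x) = 0$.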



In particular, if $\varphi$ is identically zero, subproblem $P_1(x)$ can be considered as a subproblem of
classical gradient methods [10] (with a regularization term). Alternatively, the subproblem $P_2(x)$ is
a subproblem in the framework of classical proximal point methods [SI HE| . Note that the subproblem
$P_1(x)$ is also closely related to the Levenberg-Marquardt algorithm for solving least squares problems
or to trust-region methods when the $\ell_2$-norm is chosen.

The rest of the paper is organized as follows. The next section presents the gradient mapping
concept and proves technical results that will be used in the sequel. Section 3 describes a
proximal-point scheme for solving (P) and proves its convergence. Section 4 considers the unconstrained
case and provides a global complexity estimate for the previous algorithm. The last section
presents a special case of problem (P), where the function $g$ is strongly convex with a "sufficiently
large" parameter. In this case, an accelerated proximal-type scheme is applied to this problem. Its
global convergence is investigated and the complexity bound is estimated.

2 Gradient mapping and its properties 

Let us first recall some definitions related to the theory in this paper [TJ [TUJ [H]. As before, for a
given convex set $\Omega$, the normal cone of $\Omega$ at $x$ is defined by

$$N_\Omega(x) := \begin{cases} \{w \in \mathbb{R}^n \mid w^T(y - x) \ge 0, \; y \in \Omega\} & \text{if } x \in \Omega, \\ \emptyset & \text{otherwise}. \end{cases} \tag{2.1}$$

The set of feasible directions to $\Omega$ at $x$ is given by

$$\mathcal{F}_\Omega(x) := \{d \in \mathbb{R}^n \mid d = t(y - x), \; y \in \Omega, \; t \ge 0\}. \tag{2.2}$$

Let us define

$$Df(x^*)[d] := \nabla g(x^*)^T d + \xi(x^*)^T F'(x^*) d, \tag{2.3}$$

with $\xi(x^*) \in \partial\varphi(F(x^*))$ a subgradient of $\varphi$ at $F(x^*)$. Recall that the necessary optimality
condition of problem (P) is

$$0 \in \nabla g(x^*) + F'(x^*)^T \partial\varphi(F(x^*)) + N_\Omega(x^*).$$

This condition can be expressed equivalently as

$$Df(x^*)[d] \ge 0, \quad \forall d \in \mathcal{F}_\Omega(x^*). \tag{2.4}$$
Now, let us define the following mapping

$$\psi(y; x) := g(x) + \nabla g(x)^T(y - x) + \varphi(F(x) + F'(x)(y - x)). \tag{2.5}$$

Then the convex subproblem $P_1(x)$ can be rewritten as

$$f_\rho(x) := \min\left\{ \psi(y; x) + \frac{\rho}{2}\|y - x\|^2 \;\middle|\; y \in \Omega \right\}. \tag{2.6}$$

Since this problem is uniquely solvable, $f_\rho(x)$ is well-defined (finite). Let us denote by $V_\rho(x)$ the
global solution of this problem, i.e.:

$$V_\rho(x) := \mathrm{argmin}\left\{ \psi(y; x) + \frac{\rho}{2}\|y - x\|^2 \;\middle|\; y \in \Omega \right\}. \tag{2.7}$$

From these definitions, we have $f_\rho(x) = \psi(V_\rho(x); x) + \frac{\rho}{2}\|x - V_\rho(x)\|^2$. The necessary and sufficient
optimality condition for subproblem (2.6) becomes

$$[\nabla g(x) + \rho(V_\rho(x) - x) + F'(x)^T \xi(x)]^T (y - V_\rho(x)) \ge 0, \quad \forall y \in \Omega, \tag{2.8}$$

where $\xi(x) \in \partial\varphi(F(x) + F'(x)(V_\rho(x) - x))$. We define a new mapping $G_\rho$ as

$$G_\rho(x) := \rho(x - V_\rho(x)). \tag{2.9}$$

Then $G_\rho$ is referred to as a gradient mapping of problem (2.6) (see, e.g., [10, 12]).



Alternatively, for the subproblem $P_2(x)$ we define

$$\hat\psi(y; x) := g(y) + \varphi(F(x) + F'(x)(y - x)), \tag{2.10}$$

$$\hat V_\rho(x) := \mathrm{argmin}\left\{ \hat\psi(y; x) + \frac{\rho}{2}\|y - x\|^2 \;\middle|\; y \in \Omega \right\}, \tag{2.11}$$

and

$$\hat f_\rho(x) := \hat\psi(\hat V_\rho(x); x) + \frac{\rho}{2}\|\hat V_\rho(x) - x\|^2. \tag{2.12}$$

The optimality condition for problem (2.11) is

$$[\nabla g(\hat V_\rho(x)) + \rho(\hat V_\rho(x) - x) + F'(x)^T \hat\xi(x)]^T (y - \hat V_\rho(x)) \ge 0, \quad \forall y \in \Omega, \tag{2.13}$$

where $\hat\xi(x) \in \partial\varphi(F(x) + F'(x)(\hat V_\rho(x) - x))$, and the gradient mapping $\hat G_\rho$ associated with $P_2(x)$ is defined as

$$\hat G_\rho(x) := \rho(x - \hat V_\rho(x)). \tag{2.14}$$

Let us denote $d_\rho(x) := V_\rho(x) - x$, $\hat d_\rho(x) := \hat V_\rho(x) - x$, $r_\rho(x) := \|d_\rho(x)\|$ and $\hat r_\rho(x) := \|\hat d_\rho(x)\|$. The
mapping $d_\rho$ (resp., $\hat d_\rho$) can be considered as a search direction of the proximal algorithm scheme.
We have the following conclusions.

Lemma 2.1. If $V_\rho(x) = x$ (resp., $\hat V_\rho(x) = x$) then $x$ is a stationary point of (P).

Proof. Substituting $V_\rho(x) = x$ (resp., $\hat V_\rho(x) = x$) into (2.8) (resp., (2.13)), we again obtain the stationarity condition (1.8). □

Lemma 2.2. The norm of the gradient mapping $\|G_\rho(x)\|$ is nondecreasing in $\rho$, and the norm $r_\rho(x)$
of the search direction $d_\rho(x)$ is nonincreasing in $\rho$. If $g$ is convex then these conclusions also hold
for $\|\hat G_\rho(x)\|$ and $\hat r_\rho(x)$, respectively. Moreover,

$$f(x) - f_\rho(x) \ge \frac{\rho}{2} r_\rho^2(x). \tag{2.15}$$

Proof. It is sufficient to prove the first part. The second part can be proved similarly. The
function $\kappa(t, y) := \psi(y; x) + \frac{1}{2t}\|y - x\|^2$ is jointly convex in the two variables $y$ and $t > 0$. Hence $\eta(t) :=
\min_{y \in \Omega} \kappa(t, y)$ is still convex. It is easy to show that $\eta'(t) = -\frac{1}{2t^2}\|V_{1/t}(x) - x\|^2 = -\frac{1}{2t^2}\|d_{1/t}(x)\|^2 =
-\frac{1}{2}\|G_{1/t}(x)\|^2$. Since $\eta(t)$ is convex, $\eta'(t)$ is nondecreasing in $t$. This implies that $\|G_{1/t}(x)\|$ is
nonincreasing in $t$. Thus $\|G_\rho(x)\|$ is nondecreasing in $\rho$ and $\|d_\rho(x)\|$ is nonincreasing in $\rho$.
To prove the last inequality (2.15), it follows from the convexity of $\eta$ that

$$f(x) = \eta(0) \ge \eta(t) + s(t)(0 - t) = \eta(t) + \frac{1}{2t} r_{1/t}^2(x), \tag{2.16}$$

where $s(t) \in \partial\eta(t)$. On the other hand, $f_\rho(x) = \eta(1/\rho)$. Substituting this relation into (2.16), we
obtain (2.15). The lemma is proved. □



This lemma shows that if we increase the parameter $\rho$ in the subproblem $P_1(x)$
(resp., $P_2(x)$) then we obtain a shorter search direction $d_\rho(x)$ (resp., $\hat d_\rho(x)$). Therefore, a suitable
choice of the parameter $\rho$ is necessary in practice.
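
The monotonicity stated in Lemma 2.2 and the inequality (2.15) can be observed numerically on a scalar instance. The sketch below is an illustration under assumptions: it uses $\Omega = \mathbb{R}$, $\varphi = |\cdot|$, the made-up data $g(x) = \frac{1}{2}x^2$, $F(x) = x^2 - 1$ at the point $x = 3$, and a hypothetical closed-form solver for $P_1(x)$.

```python
def solve_p1_scalar(a, b, c, rho):
    """Minimizer of a*d + |b + c*d| + 0.5*rho*d**2 over d in R (rho > 0)."""
    if c == 0:
        return -a / rho
    d_plus = -(a + c) / rho
    if b + c * d_plus > 0:
        return d_plus
    d_minus = -(a - c) / rho
    if b + c * d_minus < 0:
        return d_minus
    return -b / c


x = 3.0
g_x, a = 0.5 * x * x, x              # g(x) and g'(x)
b, c = x * x - 1.0, 2.0 * x          # F(x) and F'(x)
f_x = g_x + abs(b)                   # f(x) = g(x) + |F(x)|

rhos = [0.5, 1.0, 2.0, 4.0, 8.0, 16.0]
steps, grad_map_norms, gaps = [], [], []
for rho in rhos:
    d = solve_p1_scalar(a, b, c, rho)
    f_rho = g_x + a * d + abs(b + c * d) + 0.5 * rho * d * d
    steps.append(abs(d))                        # r_rho(x)
    grad_map_norms.append(rho * abs(d))         # ||G_rho(x)||
    gaps.append(f_x - f_rho - 0.5 * rho * d * d)  # slack in inequality (2.15)
```

In this run the step length stays at $4/3$ while the subproblem solution sits at the kink of the linearized term, and then shrinks like $1/\rho$ once $\rho$ is large enough for a smooth branch to take over; the gradient mapping norm grows and then saturates.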

In the sequel, we introduce the following assumptions.

A.2. The function $F$ is Lipschitz continuously differentiable on $\mathbb{R}^n$ with a Lipschitz constant $L_F$,
i.e.

$$\|F'(x) - F'(y)\| \le L_F\|y - x\|, \quad \forall x, y \in \mathcal{F}. \tag{2.17}$$

A.3. The function $g$ is Lipschitz continuously differentiable with a Lipschitz constant $L_g > 0$ on
$\mathrm{dom}(g)$, i.e.

$$\|\nabla g(x) - \nabla g(y)\| \le L_g\|y - x\|, \quad \forall x, y \in \mathrm{dom}(g). \tag{2.18}$$

A.3'. The function $g$ is proper, lower semicontinuous and convex on its domain $\mathrm{dom}(g)$.

Under the conditions (2.17) and (2.18), applying the mean value theorem [13] we can easily
prove that:

$$\|F(y) - F(x) - F'(x)(y - x)\| \le \frac{L_F}{2}\|y - x\|^2, \tag{2.19}$$

$$|g(y) - g(x) - \nabla g(x)^T(y - x)| \le \frac{L_g}{2}\|y - x\|^2. \tag{2.20}$$

The following lemma gives an upper estimate for the objective function $f$ of (P).



Lemma 2.3. Suppose that Assumptions A.1 and A.2 are satisfied. If, in addition, Assumption A.3
holds then

$$|f(y) - \psi(y; x)| \le \frac{1}{2}(L_g + L_\varphi L_F)\|y - x\|^2. \tag{2.21}$$

Alternatively, if, in addition, Assumption A.3' is satisfied then

$$|f(y) - \hat\psi(y; x)| \le \frac{1}{2}L_\varphi L_F\|y - x\|^2. \tag{2.22}$$

Proof. Using the Lipschitz continuity of $\varphi$ and the estimates (2.20) and (2.19), we have

$$\begin{aligned} |f(y) - \psi(y; x)| &\le |g(y) - g(x) - \nabla g(x)^T(y - x)| + |\varphi(F(y)) - \varphi(F(x) + F'(x)(y - x))| \\ &\le \frac{L_g + L_\varphi L_F}{2}\|y - x\|^2, \end{aligned}$$

which proves (2.21). Similarly, we have

$$|f(y) - \hat\psi(y; x)| = |\varphi(F(y)) - \varphi(F(x) + F'(x)(y - x))| \le \frac{L_\varphi L_F}{2}\|y - x\|^2,$$

which proves (2.22). □
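
The estimate (2.21) can be sanity-checked on a grid. The following sketch uses an illustrative scalar instance that is not taken from the paper: $g(x) = \frac{1}{2}x^2$ (so $L_g = 1$), $F(x) = x^2 - 1$ (so $L_F = 2$) and $\varphi = |\cdot|$ (so $L_\varphi = 1$). It records the largest violation of the bound over all pairs of grid points; a nonpositive value (up to rounding) is what Lemma 2.3 predicts.

```python
def g(x):
    return 0.5 * x * x


def dg(x):
    return x


def F(x):
    return x * x - 1.0


def dF(x):
    return 2.0 * x


def f(x):
    """Objective f = g + |F| of the illustrative instance."""
    return g(x) + abs(F(x))


def psi(y, x):
    """Model (2.5): linearize g and F at x, keep the outer function |.| exact."""
    return g(x) + dg(x) * (y - x) + abs(F(x) + dF(x) * (y - x))


L_g, L_phi, L_F = 1.0, 1.0, 2.0
bound = 0.5 * (L_g + L_phi * L_F)

grid = [-3.0 + 0.25 * i for i in range(25)]   # points in [-3, 3]
worst_violation = max(abs(f(y) - psi(y, x)) - bound * (y - x) ** 2
                      for x in grid for y in grid)
```

For this instance the linearization error of $g$ is exactly $\frac{1}{2}(y-x)^2$ and that of $F$ is exactly $(y-x)^2$, so the bound is attained for some pairs, which is why a small rounding tolerance is appropriate when testing it.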
From these estimates, it follows that

$$f(y) \le m_\rho(y; x) := \psi(y; x) + \frac{\rho}{2}\|y - x\|^2 \quad \left(\text{resp., } f(y) \le \hat m_\rho(y; x) := \hat\psi(y; x) + \frac{\rho}{2}\|y - x\|^2\right), \quad \forall y \in \Omega, \tag{2.23}$$

provided that $\rho \ge L_g + L_\varphi L_F$ (resp., $\rho \ge L_\varphi L_F$). The algorithmic scheme is then designed to generate
a sequence $\{x^k\} \subset \Omega$ starting from $x^0 \in \Omega$ that decreases the model $m_\rho(y; x)$ (resp., $\hat m_\rho(y; x)$).
The following lemma provides some useful properties that will be used in the sequel.

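Lemma 2.4. Suppose that Assumptions A.1 and A.2 hold. If Assumption A.3 holds then, for any $x \in \Omega$, we have

$$f(x) - f(V_\rho(x)) \ge \frac{2\rho - (L_g + L_\varphi L_F)}{2} r_\rho^2(x) = \frac{2\rho - (L_g + L_\varphi L_F)}{2\rho^2}\|G_\rho(x)\|^2, \tag{2.24}$$

$$Df(x)[d_\rho(x)] \le -\rho\, r_\rho(x)^2 = -\frac{1}{\rho}\|G_\rho(x)\|^2. \tag{2.25}$$

If Assumption A.3' holds then, for any $x \in \Omega$, we have

$$f(x) - f(\hat V_\rho(x)) \ge \frac{2\rho - L_\varphi L_F}{2} \hat r_\rho(x)^2 = \frac{2\rho - L_\varphi L_F}{2\rho^2}\|\hat G_\rho(x)\|^2, \tag{2.26}$$

$$Df(x)[\hat d_\rho(x)] \le -\rho\, \hat r_\rho(x)^2 = -\frac{1}{\rho}\|\hat G_\rho(x)\|^2. \tag{2.27}$$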

Proof. Let us denote $V := V_\rho(x)$ and $\hat V := \hat V_\rho(x)$. From the estimate (2.20), we have

$$f(V) = g(V) + \varphi(F(V)) \le g(x) + \nabla g(x)^T(V - x) + \frac{L_g}{2}\|V - x\|^2 + \varphi(F(V)). \tag{2.28}$$

On the other hand, using the optimality condition (2.8) with $y = x \in \Omega$, we obtain

$$\nabla g(x)^T(V - x) \le -\rho\|V - x\|^2 + \xi(x)^T F'(x)(x - V). \tag{2.29}$$

Since $\varphi$ is convex, we have

$$\varphi(F(x)) - \varphi(F(x) + F'(x)(V - x)) \ge \xi(x)^T F'(x)(x - V), \tag{2.30}$$

where $\xi(x) \in \partial\varphi(F(x) + F'(x)(V - x))$. Combining (2.28), (2.29) and (2.30), we obtain

$$f(V) \le g(x) + \varphi(F(x)) - \frac{2\rho - L_g}{2}\|V - x\|^2 + \varphi(F(V)) - \varphi(F(x) + F'(x)(V - x)). \tag{2.31}$$

Using the $L_\varphi$-Lipschitz continuity of $\varphi$ and the estimate (2.19), we get

$$\varphi(F(V)) - \varphi(F(x) + F'(x)(V - x)) \le L_\varphi\|F(V) - F(x) - F'(x)(V - x)\| \le \frac{L_\varphi L_F}{2}\|V - x\|^2. \tag{2.32}$$

Plugging this inequality into (2.31), and noting that $\rho^2\|V - x\|^2 = \|G_\rho(x)\|^2$, we get (2.24).
Now, we prove the second inequality. From the definition (2.3), we have

$$Df(x)[V - x] = [\nabla g(x) + F'(x)^T \bar\xi(x)]^T(V - x), \tag{2.33}$$

where $\bar\xi(x) \in \partial\varphi(F(x))$. Using (2.8) with $y = x$, we obtain

$$[\nabla g(x) + F'(x)^T \bar\xi(x)]^T(V - x) \le [F'(x)^T(\bar\xi(x) - \xi(x))]^T(V - x) - \rho\|V - x\|^2. \tag{2.34}$$

By the convexity of $\varphi$ (monotonicity of $\partial\varphi$) we have

$$[F'(x)^T(\bar\xi(x) - \xi(x))]^T(V - x) = -(\xi(x) - \bar\xi(x))^T F'(x)(V - x) \le 0. \tag{2.35}$$

Combining (2.33), (2.34) and (2.35), we get

$$Df(x)[V - x] \le -\rho\|V - x\|^2 = -\frac{1}{\rho}\|G_\rho(x)\|^2,$$

which proves (2.25).

If $g$ is convex then we have

$$f(\hat V) = g(\hat V) + \varphi(F(\hat V)) \le g(x) + \nabla g(\hat V)^T(\hat V - x) + \varphi(F(\hat V)).$$

Using this inequality and the same argument as before, we can easily prove the inequalities
(2.26) and (2.27). □

The inequality (2.25) (resp., (2.27)) shows that $d_\rho(x)$ (resp., $\hat d_\rho(x)$) is a descent search direction
of problem (P).

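Let us define the level set of the function $f$ restricted to $\Omega$ as follows:

$$\mathcal{L}_f(\alpha) := \{y \in \Omega \mid f(y) \le \alpha\}. \tag{2.36}$$

We have the following result.

Lemma 2.5. Suppose that $x \in \mathrm{int}(\mathcal{L}_f(f(x))) \subseteq \mathrm{int}(\mathcal{F})$. If $\rho \ge L_g + L_\varphi L_F$ (resp., $g$ is convex
and $\rho \ge L_\varphi L_F$) then $V_\rho(x) \in \mathcal{L}_f(f(x))$ (resp., $\hat V_\rho(x) \in \mathcal{L}_f(f(x))$).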

Proof. It is sufficient to prove the first statement. The second one can be done similarly. It is trivial
that $x \in \mathcal{L}_f(f(x))$. Assume, for contradiction, that $V := V_\rho(x) \notin \mathcal{L}_f(f(x))$. Since $x \in \mathrm{int}(\mathcal{L}_f(f(x)))$, the
line segment connecting $x$ and $V_\rho(x)$ intersects the boundary of $\mathcal{L}_f(f(x))$ at $x(\alpha) = x + \alpha(V_\rho(x) - x)$
for some $\alpha \in (0, 1)$. From the definition of $x(\alpha)$ and (2.15), we have

$$f(x(\alpha)) \ge f(x) \ge f_\rho(x). \tag{2.37}$$

Consider $d := F(x(\alpha)) - F(x) - \alpha F'(x)(V - x)$. By virtue of (2.19) with $y = x(\alpha)$, one has

$$\|d\| \le \frac{L_F}{2}\alpha^2\|V - x\|^2. \tag{2.38}$$

Using Assumptions A.2 and A.3 and the convexity of $\varphi$, we have

$$\begin{aligned} f(x(\alpha)) &= g(x + \alpha(V - x)) + \varphi(F(x) + \alpha F'(x)(V - x) + d) \\ &\le g(x) + \alpha\nabla g(x)^T(V - x) + \frac{L_g\alpha^2}{2}\|V - x\|^2 + \varphi(F(x) + \alpha F'(x)(V - x)) + L_\varphi\|d\| \\ &\le (1 - \alpha)f(x) + \alpha\left[\psi(V; x) + \frac{\rho}{2}\|V - x\|^2\right] - \frac{\alpha[\rho - \alpha(L_g + L_\varphi L_F)]}{2}\|V - x\|^2. \end{aligned}$$

From (2.37), (2.38) and the last estimate, we obtain

$$f(x) \le f_\rho(x) - \frac{\rho - \alpha(L_g + L_\varphi L_F)}{2}\|V - x\|^2,$$

which, since $\rho \ge L_g + L_\varphi L_F$, $\alpha < 1$ and $V \ne x$, contradicts (2.15). □
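Lemma 2.6. Suppose that both $x$ and $V_\rho(x)$ are in $\mathcal{F}$. Then

$$f_\rho(x) \le \min\left\{ f(y) + \frac{\bar\rho}{2}\|y - x\|^2 \;\middle|\; y \in \Omega \right\}, \tag{2.39}$$

where $\bar\rho := \rho + L_g + L_\varphi L_F$. Moreover, if $x^*$ is a solution to (P), with $f^* := f(x^*)$, and $\mathcal{L}_f(f(x)) \subseteq \mathcal{F}$ then

$$f_\rho(x) \le f^* + \frac{\bar\rho}{2}\|x^* - x\|^2. \tag{2.40}$$

Alternatively, if both $x$ and $\hat V_\rho(x)$ are in $\mathcal{F}$ then

$$\hat f_\rho(x) \le \min\left\{ f(y) + \frac{\hat\rho}{2}\|y - x\|^2 \;\middle|\; y \in \Omega \right\}, \tag{2.41}$$

where $\hat\rho := \rho + L_\varphi L_F$, and if $x^*$ is a solution to (P) and $\mathcal{L}_f(f(x)) \subseteq \mathcal{F}$ then

$$\hat f_\rho(x) \le f^* + \frac{\hat\rho}{2}\|x^* - x\|^2. \tag{2.42}$$

Proof. For any $y \in \mathcal{F}$, we denote $d_g(x, y) := g(y) - g(x) - \nabla g(x)^T(y - x)$ and $d_F(x, y) :=
F(y) - F(x) - F'(x)(y - x)$. By (2.20) and (2.19), we have $|d_g(x, y)| \le \frac{L_g}{2}\|y - x\|^2$ and $\|d_F(x, y)\| \le
\frac{L_F}{2}\|y - x\|^2$. Since both $x$ and $V_\rho(x)$ are in $\mathcal{F}$, using the Lipschitz continuity of $\varphi$, we have

$$\begin{aligned} f_\rho(x) &= \min\left\{ \psi(y; x) + \frac{\rho}{2}\|y - x\|^2 \;\middle|\; y \in \Omega \right\} \\ &= \min\left\{ g(y) - d_g(x, y) + \varphi(F(y) - d_F(x, y)) + \frac{\rho}{2}\|y - x\|^2 \;\middle|\; y \in \Omega \right\} \\ &\le \min\left\{ g(y) + \varphi(F(y)) + \frac{\rho + L_g + L_\varphi L_F}{2}\|y - x\|^2 \;\middle|\; y \in \Omega \right\}. \end{aligned}$$

By denoting $\bar\rho := \rho + L_g + L_\varphi L_F$, we obtain (2.39). The estimate (2.40) is then proved by
substituting $y = x^*$ into (2.39). The remainder is proved similarly. □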



3 Algorithm framework and its global convergence 

Let us denote $L := L_g + L_\varphi L_F$ (resp., $\hat L := L_\varphi L_F$). As shown in Lemmas 2.3 and 2.4,
if the regularization parameter $\rho$ is chosen such that $\rho \ge L$ (resp., $\rho \ge \hat L$) then $d_\rho(x)$ (resp., $\hat d_\rho(x)$) is a descent
direction for problem (P). However, if $\rho$ is too big, the algorithm may generate short steps. Balancing
these two issues plays an important role in implementation. In the following algorithm, we
combine the gradient scheme with a simple line-search strategy to determine $\rho$ adaptively.
The algorithm is described as follows:

Algorithm 1.

Initialization: Choose $x^0$ in $\Omega$ and fix $L_0 \in (0, L]$ (resp., $\hat L_0 \in (0, \hat L]$). Set $k := 0$.
Iteration k: For a given $x^k$, execute the two steps below:

Step 1: Find $\rho_k \in [L_0, 2L]$ (resp., $\hat\rho_k \in [\hat L_0, 2\hat L]$) such that $f(V_{\rho_k}(x^k)) \le f_{\rho_k}(x^k)$ (resp.,
$f(\hat V_{\hat\rho_k}(x^k)) \le \hat f_{\hat\rho_k}(x^k)$).

Step 2: Update the new iterate $x^{k+1} := V_{\rho_k}(x^k)$ (resp., $x^{k+1} := \hat V_{\hat\rho_k}(x^k)$). Increase $k$ by 1
and go back to Step 1.
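
One common way to realize Step 1 is to start from $L_0$ and double $\rho$ until the acceptance test $f(V_\rho(x^k)) \le f_\rho(x^k)$ holds. The paper does not prescribe this particular search, so the doubling rule, the stopping test on $|d|$, and all names below are assumptions. The sketch works in the scalar setting $\Omega = \mathbb{R}$, $\varphi = |\cdot|$ with the made-up data $g(x) = \frac{1}{2}x^2$, $F(x) = x^2 - 1$.

```python
def solve_p1_scalar(a, b, c, rho):
    """Minimizer of a*d + |b + c*d| + 0.5*rho*d**2 over d in R (rho > 0)."""
    if c == 0:
        return -a / rho
    d_plus = -(a + c) / rho
    if b + c * d_plus > 0:
        return d_plus
    d_minus = -(a - c) / rho
    if b + c * d_minus < 0:
        return d_minus
    return -b / c


def algorithm1(g, dg, F, dF, x0, L0, tol=1e-8, max_iter=100):
    """Sketch of Algorithm 1 with a doubling search for rho (an assumed rule)."""
    f = lambda z: g(z) + abs(F(z))
    x, rho = x0, L0
    history = [f(x)]
    for _ in range(max_iter):
        while True:
            d = solve_p1_scalar(dg(x), F(x), dF(x), rho)
            if abs(d) <= tol:                 # V_rho(x) ~ x: stationary (Lemma 2.1)
                return x, rho, history
            f_rho = g(x) + dg(x) * d + abs(F(x) + dF(x) * d) + 0.5 * rho * d * d
            if f(x + d) <= f_rho:             # Step 1 acceptance test
                break
            rho *= 2.0                        # test failed: increase regularization
        x = x + d                             # Step 2: x^{k+1} := V_rho(x^k)
        history.append(f(x))
    return x, rho, history


x_final, rho_final, history = algorithm1(
    g=lambda x: 0.5 * x * x, dg=lambda x: x,
    F=lambda x: x * x - 1.0, dF=lambda x: 2.0 * x,
    x0=3.0, L0=0.1)
```

Since the test is guaranteed to hold once $\rho \ge L$ by (2.23), the doubling search stops with $\rho_k \le 2L$, which matches the interval required in Step 1. The sketch keeps $\rho$ between iterations; resetting or decreasing it is another possible design choice.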



The computational cost of solving the subproblem $P_1(x)$ (resp., $P_2(x)$) mostly depends on the
structure of the outer convex function $\varphi$ and of $\Omega$ (resp., additionally, on the structure of the function $g$).
If $\Omega = \mathbb{R}^n$ (the unconstrained case) and $\varphi(u) = \|u\|$ then the subproblem $P_1(x)$ can be solved by a
standard linear algebraic procedure (see [11]).

To prove the convergence of Algorithm 1 we require the following assumption.

A.4. The set $\mathcal{F}$ is sufficiently large such that $\mathcal{L}_f(f(x^0)) \subseteq \mathcal{F}$.

If this assumption is satisfied then $\mathcal{L}_f(f(x^k)) \subseteq \mathcal{F}$ for all $k \ge 0$ because the sequence
$\{f(x^k)\}_{k \ge 0}$ is nonincreasing.

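Theorem 3.1. Suppose that Assumptions A.1, A.2 and A.4 hold. Then for any $k \ge 0$:

a) If Assumption A.3 is satisfied then

$$f(x^k) - f^* \ge \frac{L_0}{2}\sum_{i=k}^{\infty} r_{\rho_i}(x^i)^2 \ge \frac{L_0}{2}\sum_{i=k}^{\infty} r_{2L}(x^i)^2. \tag{3.1}$$

b) If Assumption A.3' is satisfied then

$$f(x^k) - f^* \ge \frac{\hat L_0}{2}\sum_{i=k}^{\infty} \hat r_{\hat\rho_i}(x^i)^2 \ge \frac{\hat L_0}{2}\sum_{i=k}^{\infty} \hat r_{2\hat L}(x^i)^2. \tag{3.2}$$

Consequently, one has

$$\lim_{k \to \infty}\|x^{k+1} - x^k\| = 0, \tag{3.3}$$

and the set of limit points $\Omega^*$ of the sequence $\{x^k\}_{k \ge 0}$ is connected. If this sequence is bounded (in
particular, if $\mathcal{L}_f(f(x^0))$ is bounded) then every limit point is a critical point of (P). Furthermore, if
the set of limit points $\Omega^*$ is finite then the sequence $\{x^k\}$ converges to a point $x^*$ in $S^*$.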

Proof. From Step 1 of Algorithm 1 we have $f(x^{i+1}) \le f_{\rho_i}(x^i)$. Combining this relation with (2.15),
and noting that $r_\rho(x^i)$ is nonincreasing in $\rho$ by virtue of Lemma 2.2, we have

$$f(x^{i+1}) \le f_{\rho_i}(x^i) \le f(x^i) - \frac{\rho_i}{2} r_{\rho_i}^2(x^i) \le f(x^i) - \frac{L_0}{2} r_{\rho_i}^2(x^i) \le f(x^i) - \frac{L_0}{2} r_{2L}^2(x^i). \tag{3.4}$$

Summing up the inequality (3.4) from $i = k$ to $i = N \ge k$ we get

$$f(x^k) - f(x^{N+1}) \ge \frac{L_0}{2}\sum_{i=k}^{N} r_{\rho_i}^2(x^i) \ge \frac{L_0}{2}\sum_{i=k}^{N} r_{2L}^2(x^i). \tag{3.5}$$

Note that the sequence $\{f(x^k)\}_{k \ge 0}$ is bounded from below. Passing to the limit as $N \to \infty$ in (3.5)
we obtain (3.1). The inequalities (3.2) are proved similarly.

Now, substituting $k = 0$ into (3.1), we obtain $\sum_{i=0}^{\infty} r_{\rho_i}(x^i)^2 < +\infty$. Since $r_{\rho_i}(x^i) = \|x^i - x^{i+1}\|$,
we get $\lim_{i \to \infty}\|x^i - x^{i+1}\| = 0$, which proves (3.3).

If the sequence $\{x^k\}_{k \ge 0}$ is bounded then, by passing to the limit through a subsequence and combining
with Lemma 2.1, we easily prove that every limit point is a critical point. If the set of limit points
$\Omega^*$ is finite then, by applying the result in [15, Chapt. 28], we obtain the remaining
conclusion. □

In the framework of least squares problems, the number of data points is often larger
than the number of parameters (or variables). In this case, we have $m \ge n$. A critical point $x^* \in S^*$
of (P) is said to be nondegenerate if $\sigma_F^* := \sigma_{\min}(F'(x^*)) > 0$, where $\sigma_{\min}(F'(x^*))$ is the smallest
singular value of the matrix $F'(x^*)$. We require the following assumption.

A.5. The set of nondegenerate critical points $x^* \in S^*$ of (P) is nonempty.

We also denote $\sigma_g^* := \sigma_{\min}(\nabla g(x^*)^T) \ge 0$, the smallest singular value of the vector $\nabla g(x^*)^T$. (This
notation is convenient for the case $n = 1$.)

A set $S_\varphi$ is said to be a set of weak sharp minima for the function $\varphi$ if there exists a constant
$\gamma_\varphi > 0$ such that

$$\varphi(u) - \varphi_{\min} \ge \gamma_\varphi\,\mathrm{dist}(u, S_\varphi), \quad \forall u \in \mathrm{dom}\,\varphi, \tag{3.6}$$

where $\varphi_{\min} := \min_{u \in \mathrm{dom}\,\varphi}\varphi(u)$ and $\mathrm{dist}(u, S)$ is the Euclidean distance from $u$ to a set $S$. The
constant $\gamma_\varphi$ and the set $S_\varphi$ are called the modulus and the domain of sharpness for $\varphi$ over $S_\varphi$, respectively
(seei).
We have the following result.

Theorem 3.2. Suppose that Assumptions A.1-A.4 hold, that problem (P) satisfies Assumption A.5
with a local solution $x^* \in S^*$, and that the set of weak sharp minima $S_\varphi$ of $\varphi$ is

nonempty. Then, if x k € Cf(f(x )) and \\x k — x*\\ < ^+bL ^ len xk+1 e £f{f( x °)) an d 

k ™*l|2 



where $\bar L := L_g + \gamma_\varphi L_F$ and $\bar\sigma := \sigma_g^* + \gamma_\varphi\sigma_F^*$.

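Proof. From (2.40), noting that $S_\varphi$ is nonempty and using (3.6), we have

$$\begin{aligned} \frac{3L}{2}\|x^k - x^*\|^2 &\ge f_{\rho_k}(x^k) - f^* \ge \psi(x^{k+1}; x^k) - f^* \\ &= g(x^k) - g(x^*) + \nabla g(x^k)^T(x^{k+1} - x^k) + \varphi(F(x^k) + F'(x^k)(x^{k+1} - x^k)) - \varphi(F(x^*)) \\ &\ge g(x^k) - g(x^*) + \nabla g(x^k)^T(x^{k+1} - x^k) + \gamma_\varphi\|F(x^k) + F'(x^k)(x^{k+1} - x^k) - F(x^*)\|. \end{aligned} \tag{3.8}$$

Now, using Assumption A.4 and (2.20), we estimate

$$\begin{aligned} g(x^k) - g(x^*) + \nabla g(x^k)^T(x^{k+1} - x^k) &= g(x^k) - g(x^*) - \nabla g(x^*)^T(x^k - x^*) \\ &\quad + [\nabla g(x^k) - \nabla g(x^*)]^T(x^{k+1} - x^k) + \nabla g(x^*)^T(x^{k+1} - x^*) \\ &\ge -\frac{L_g}{2}\|x^k - x^*\|^2 - L_g\|x^k - x^*\|\,\|x^{k+1} - x^k\| + \sigma_g^*\|x^{k+1} - x^*\| \\ &\ge [\sigma_g^* - L_g\|x^k - x^*\|]\,\|x^{k+1} - x^*\| - \frac{3L_g}{2}\|x^k - x^*\|^2. \end{aligned} \tag{3.9}$$

Similarly, using Assumption A.4 and (2.19), we have

$$\|F(x^k) + F'(x^k)(x^{k+1} - x^k) - F(x^*)\| \ge [\sigma_F^* - L_F\|x^k - x^*\|]\,\|x^{k+1} - x^*\| - \frac{3L_F}{2}\|x^k - x^*\|^2. \tag{3.10}$$

Plugging (3.9) and (3.10) into (3.8) we get

$$\frac{3L + 3L_g + 3\gamma_\varphi L_F}{2}\|x^k - x^*\|^2 \ge \left[\sigma_g^* + \gamma_\varphi\sigma_F^* - (L_g + \gamma_\varphi L_F)\|x^k - x^*\|\right]\|x^{k+1} - x^*\|.$$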

Since \\x k — x*\\ < ^XjI'l'f t ^ ien ^ e ^ ast mec l uant y implies the first part of (|3.7p . If \\x k — x*\\ < 
„^ a J', +1 , 4 '? F \ then we obtain the second part of (13.71) . □ 
4 Global convergence rate of the unconstrained case

In this section, we consider the rate of global convergence of Algorithm 1 based on the subproblem
$P_1(x)$ for the unconstrained case, i.e. $\Omega = \mathbb{R}^n$.

For a given $x \in \mathcal{F}$, let us define the following matrix mapping from $\mathbb{R}^n$ to $\mathbb{R}^{n \times (m+1)}$:

$$M(x) := [F'(x)^T, \; \nabla g(x)]_{n \times (m+1)}. \tag{4.1}$$

The matrix mapping $M(x)$ is said to be nondegenerate at $x$ if $\sigma_{\min}(M(x)) \ge \sigma_M > 0$, where $\sigma_{\min}(M(x))$ is the smallest
singular value of $M(x)$. The matrix $M(x)$ is said to be nondegenerate on a given set $C$ if it is nondegenerate
at any $x \in C$. We make the following assumption.

A.6. The matrix mapping $M(x)$ is nondegenerate on $\mathcal{L}_f(f(x^0))$.

Note that this assumption implies that $m < n$. In terms of nonlinear optimization, this is the common
case in which the number of equality constraints is smaller than the number of variables.
Assumption A.6 is closely related to the linear independence constraint qualification (LICQ) in nonlinear
programming.

By using the Schur complement, Assumption A.6 is equivalent to $\nabla g(x) \ne 0$ and $\lambda_{\min}(\widetilde M(x)) \ge
\tilde\sigma > 0$, where $\widetilde M(x) := F'(x)(\|\nabla g(x)\|^2 I_n - \nabla g(x)\nabla g(x)^T)F'(x)^T$ with $I_n$ being the identity
matrix.

Theorem 4.1. Suppose that Assumptions A.1, A.2, A.4 and A.6 are satisfied and $x^* \in S^*$ is a
critical point of (P). Then

a) Let the sequence $\{x^k\}$ be generated by Algorithm 1 based on the subproblem $P_1(x)$ and let
Assumption A.3 be satisfied. If $f(x^k) - f(x^*) \ge \frac{\sigma_M^2}{2L}$, where $L := L_g + L_\varphi L_F$, then

$$f(x^{k+1}) - f(x^*) \le f(x^k) - f(x^*) - \frac{\sigma_M^2}{4L}. \tag{4.2}$$

Otherwise,

$$f(x^{k+1}) - f(x^*) \le \frac{L}{\sigma_M^2}[f(x^k) - f(x^*)]^2. \tag{4.3}$$



b) Let the sequence {x k } be generated by Algorithm]]^ based on the subproblem P\ix) with pk = L 
and satisfied Assumvtion \A.3\ If fix k ) > ^-f- then 



Otherwise, 



fix k+1 ) - f{x*) < fix k ) fix*) (4.4) 



/(^ +1 ) - fix*) < ~^[fix k ) - fix*)} 2 < \[fix k ) - fix*)} 2 . (4.5) 



Proof. It is sufficient to prove part a); part b) is proved similarly. Suppose that $x^*$ is a local minimizer of (P). Let us consider the linear system

$$\nabla g(x^k)^T d + g(x^k) - g(x^*) + \phi(F(x^k)) - \phi(F(x^*)) = 0, \qquad F'(x^k)d = 0. \tag{4.6}$$

By Assumption A.6, applying Lemma 6 in [11] and noting that $g(x^k) - g(x^*) + \phi(F(x^k)) - \phi(F(x^*)) = f(x^k) - f(x^*) \ge 0$, there exists a solution $d^*$ of the linear system (4.6) such that

$$\|d^*\| \le \frac{g(x^k) - g(x^*) + \phi(F(x^k)) - \phi(F(x^*))}{\sigma_{\min}(M(x^k))} = \frac{f(x^k) - f(x^*)}{\sigma_{\min}(M(x^k))}. \tag{4.7}$$
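A bound of the type (4.7) can be illustrated numerically. The sketch below assumes that the linear system is read as $M(x^k)^T d = (0, -[f(x^k) - f(x^*)])^T$, i.e. $F'(x^k)d = 0$ together with the scalar equation in $\nabla g(x^k)^T d$; under this reading the minimum-norm solution has norm at most the gap divided by $\sigma_{\min}(M(x^k))$. The helper name `min_norm_direction` is hypothetical, and this is not claimed to be the construction used in [11].

```python
import numpy as np

def min_norm_direction(jac_F, grad_g, gap):
    """Minimum-norm d with F'(x) d = 0 and grad_g^T d = -gap (assumed reading of the system)."""
    M = np.column_stack([jac_F.T, grad_g])          # n x (m+1)
    rhs = np.zeros(M.shape[1])
    rhs[-1] = -gap
    # lstsq returns the least-norm solution of the underdetermined system M^T d = rhs
    d, *_ = np.linalg.lstsq(M.T, rhs, rcond=None)
    sigma_min = float(np.linalg.svd(M, compute_uv=False).min())
    return d, sigma_min
```

For a nondegenerate $M(x^k)$ the returned direction satisfies both equations and its norm does not exceed `gap / sigma_min`.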
Now, by the rule at Step 1 of Algorithm 1, using the convexity of $\phi$ and noting that $\rho_k \le 2L$ (see Step 1 of Algorithm 1), where $L := L_g + L_\phi L_F$, we have

$$\begin{aligned}
f(x^{k+1}) \le f_{\rho_k}(x^k) &= \min_{d}\left\{\psi(x^k + d; x^k) + \frac{\rho_k}{2}\|d\|^2\right\} \\
&\le \min_{t\in[0,1]}\left\{g(x^k) + t\nabla g(x^k)^T d^* + \phi(F(x^k) + tF'(x^k)d^*) + Lt^2\|d^*\|^2\right\} \\
&\le \min_{t\in[0,1]}\left\{(1-t)f(x^k) + t\psi(x^k + d^*; x^k) + Lt^2\|d^*\|^2\right\}.
\end{aligned} \tag{4.8}$$

Since $\phi$ is Lipschitz continuous and $d^*$ is a solution to (4.6), we have

$$\begin{aligned}
\psi(x^k + d^*; x^k) &= g(x^k) + \nabla g(x^k)^T d^* + \phi(F(x^k) + F'(x^k)d^*) \\
&= \nabla g(x^k)^T d^* + g(x^k) - g(x^*) + \phi(F(x^k)) - \phi(F(x^*)) \\
&\quad + \phi(F(x^k) + F'(x^k)d^*) - \phi(F(x^k)) + g(x^*) + \phi(F(x^*)) \\
&\le L_\phi\|F'(x^k)d^*\| + g(x^*) + \phi(F(x^*)) \\
&= f(x^*).
\end{aligned} \tag{4.9}$$

Combining (4.7), (4.8) and (4.9), we obtain

$$f(x^{k+1}) - f(x^*) \le \min_{t\in[0,1]}\left\{(1-t)\left[f(x^k) - f(x^*)\right] + \frac{L}{\sigma_M^2}t^2\left[f(x^k) - f(x^*)\right]^2\right\}. \tag{4.10}$$

Thus if $f(x^k) - f(x^*) > \frac{\sigma_M^2}{2L}$ then the right-hand side of (4.10) attains the minimum at $t^* = \frac{\sigma_M^2}{2L[f(x^k) - f(x^*)]} < 1$ and therefore we have

$$f(x^{k+1}) - f(x^*) \le f(x^k) - f(x^*) - \frac{\sigma_M^2}{4L}.$$

Otherwise, it attains the minimum at $t^* = 1$ and we get

$$f(x^{k+1}) - f(x^*) \le \frac{L}{\sigma_M^2}\left[f(x^k) - f(x^*)\right]^2.$$

The theorem is proved. □
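The two regimes of part a) can be seen by iterating the worst-case bound itself. The sketch below minimizes the scalar function $(1-t)\delta + (L/\sigma_M^2)t^2\delta^2$ over $t \in [0,1]$ in closed form, where $\delta$ stands for the gap $f(x^k) - f(x^*)$; the function names are hypothetical and the code only tracks the upper bound, not an actual run of Algorithm 1.

```python
def next_gap_bound(gap, L, sigma_M):
    """Minimum over t in [0, 1] of (1 - t) * gap + (L / sigma_M**2) * t**2 * gap**2."""
    threshold = sigma_M**2 / (2.0 * L)
    if gap > threshold:
        return gap - sigma_M**2 / (4.0 * L)   # interior minimizer t* < 1: constant decrease
    return (L / sigma_M**2) * gap**2          # boundary minimizer t* = 1: quadratic decrease

def gap_bounds(gap0, L, sigma_M, steps):
    """Iterate the bound starting from an initial gap gap0."""
    gaps = [gap0]
    for _ in range(steps):
        gaps.append(next_gap_bound(gaps[-1], L, sigma_M))
    return gaps
```

Starting from a large gap, the bound first decreases by the constant $\sigma_M^2/(4L)$ per iteration and then switches to the quadratic regime once the gap falls below $\sigma_M^2/(2L)$.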
Let us define

$$D(x^0) := \min\left\{\|x^0 - x^*\| : x^* \in S^*\right\},$$

the distance from the initial point x° to the set of stationary points S* . From Lemma \2M Algorithm 
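1 can guarantee that $f(x^1) - f(x^*) \le \frac{3L}{2}D(x^0)^2$, where $L := L_g + L_\phi L_F$. Now, using Theorem 4.1, it is easy to see that

$$N \le 1 + \frac{4L}{\sigma_M^2}\left(f(x^1) - f(x^{N+1})\right) \le 1 + \frac{4L}{\sigma_M^2}\left[f(x^1) - f(x^*)\right] \le 1 + \frac{6L^2}{\sigma_M^2}D^2(x^0).$$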

Thus the number of iterations for Algorithm 1 starting from $x^0$ to enter the quadratic convergence region is

$$N_{\min} := 1 + 6\left[\frac{(L_g + L_\phi L_F)D(x^0)}{\sigma_M}\right]^2.$$
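As a quick illustration, the iteration estimate can be evaluated for given constants. The sketch below assumes the estimate has the form $1 + 6[(L_g + L_\phi L_F)D(x^0)/\sigma_M]^2$; the function name `n_min` and the sample values are hypothetical.

```python
def n_min(L_g, L_phi, L_F, dist0, sigma_M):
    """Estimate 1 + 6 * ((L_g + L_phi * L_F) * dist0 / sigma_M)**2 of the first-phase length."""
    L = L_g + L_phi * L_F
    return 1.0 + 6.0 * (L * dist0 / sigma_M) ** 2

# Hypothetical constants: L = 2, D(x^0) = 1, sigma_M = 2.
example = n_min(1.0, 1.0, 1.0, 1.0, 2.0)
```

The estimate grows quadratically in the ratio $L\,D(x^0)/\sigma_M$, so a poorly conditioned $M(x)$ (small $\sigma_M$) lengthens the first phase considerably.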



5 Accelerated scheme for the strongly convex case. 

When the first term $g(x)$ of the objective function $f(x)$ is strongly convex with a parameter $\tau_g \ge L_\phi L_F > 0$, we are able to accelerate Algorithm 1 by using the same trick as in gradient schemes (see
Q2J1CE2]) to solve problem flF}. Typically, we require the following assumption. 

A.7. The function $g$ is strongly convex with a parameter $\tau_g$ such that $\tau_g \ge L_\phi L_F$.
We consider two sequences $\{a_k\}_{k\ge 0}$ and $\{\varphi_k\}_{k\ge 0}$ generated recursively as follows:

$$\begin{aligned}
a_0 &:= 0, \qquad a_{k+1} := a_k + \alpha_k, \\
\varphi_0(x) &:= \frac{1}{2}\|x - x^0\|^2, \\
\varphi_{k+1}(x) &:= \varphi_k(x) + \alpha_k\left[f(V_{\rho_k}(y^k)) + \frac{1}{\rho_k}\left(\|G_{\rho_k}(y^k)\|^2 - LG_{\rho_k}(y^k)^T(x - y^k)\right)\right],
\end{aligned} \tag{5.1}$$

where the sequences $\{\alpha_k\}_{k\ge 0} \subset (0, +\infty)$ and $\{y^k\}_{k\ge 0}$ are given, $V_{\rho_k}(y^k)$ is the solution of $P_2(x)$ with $x = y^k$ and $\rho = \rho_k$, $G_{\rho_k}(y^k) := \rho_k(V_{\rho_k}(y^k) - y^k)$ and $L := \rho_k + L_\phi L_F$.

By the construction of $\{a_k\}_{k\ge 0}$ and $\{\varphi_k\}_{k\ge 0}$, it is possible to maintain the following rules for all $k \ge 0$:

$$a_kf(x^k) \le \varphi_k^* := \min_{x\in\Omega}\varphi_k(x), \tag{$R_k^1$}$$

$$\varphi_k(x) \le a_kf(x) + \frac{1}{2}\|x - x^0\|^2. \tag{$R_k^2$}$$

Note that if these rules are maintained then we have

$$f(x^k) - f(x^*) \le \frac{\|x^0 - x^*\|^2}{2a_k}, \quad k \ge 1.$$

Thus, by a suitable choice of $\alpha_k$, we can accelerate Algorithm 1 for this special case.



Lemma 5.1. Suppose that Assumption A.7 holds. If $V_\rho(x)$ is the unique solution to $P_2(x)$ then

$$f(z) - f(V_\rho(x)) \ge \frac{1}{\rho}\left[\|G_\rho(x)\|^2 - LG_\rho(x)^T(z - x)\right] \tag{5.2}$$

for all $z \in \Omega$, where $L := L_\phi L_F + \rho$.



Proof. For simplicity of notation, we denote $V := V_\rho(x)$. Since $\phi$ is $L_\phi$-Lipschitz continuous and convex, using (2.19), for any $z \in \Omega$, we have

$$\begin{aligned}
\phi(F(z)) - \phi(F(V)) &= \phi(F(z)) - \phi(F(x) + F'(x)(z - x)) \\
&\quad + \phi(F(x) + F'(x)(z - x)) - \phi(F(x) + F'(x)(V - x)) \\
&\quad + \phi(F(x) + F'(x)(V - x)) - \phi(F(V)) \\
&\ge -\frac{L_\phi L_F}{2}\left[\|z - x\|^2 + \|V - x\|^2\right] + (F'(x)^T\xi(x))^T(z - V),
\end{aligned}$$

where $\xi(x) \in \partial\phi(F(x) + F'(x)(V - x))$. Therefore,

$$\begin{aligned}
f(z) - f(V) &= g(z) + \phi(F(z)) - g(V) - \phi(F(V)) \\
&\ge \nabla g(V)^T(z - V) + \frac{\tau_g}{2}\|z - V\|^2 \\
&\quad - \frac{L_\phi L_F}{2}\left[\|z - x\|^2 + \|V - x\|^2\right] + (F'(x)^T\xi(x))^T(z - V).
\end{aligned} \tag{5.3}$$

Using the optimality condition for $P_2(x)$ we have

$$\nabla g(V)^T(z - V) + (F'(x)^T\xi(x))^T(z - V) \ge \rho(V - x)^T(V - z), \quad \forall z \in \Omega.$$

Substituting this inequality into (5.3), we obtain

$$f(z) - f(V) \ge \frac{\tau_g}{2}\|z - V\|^2 - \frac{L_\phi L_F}{2}\left[\|z - x\|^2 + \|V - x\|^2\right] + \rho(V - x)^T(V - z).$$

Since $\tau_g \ge L_\phi L_F$ by Assumption A.7, and since $\|z - V\|^2 - \|z - x\|^2 - \|V - x\|^2 = -2(V - x)^T(z - x)$, the last inequality implies that

$$f(z) - f(V) \ge -L_\phi L_F(V - x)^T(z - x) - \rho(V - x)^T(z - x) + \rho(V - x)^T(V - x). \tag{5.4}$$

Substituting $G_\rho(x) = \rho(V - x)$ into (5.4), we obtain (5.2).
□
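The inequality of Lemma 5.1 can be sanity-checked on a scalar toy instance. The sketch below assumes $g(x) = \frac{1}{2}x^2$, $\phi = |\cdot|$, $F(x) = x$ and $\Omega = \mathbb{R}$, so that $L_F = 0$, $L = \rho$ and the subproblem reduces to soft-thresholding; these choices and the function names are hypothetical and serve only as an illustration.

```python
import math

def f(x):
    """Toy objective f(x) = 0.5 * x^2 + |x| (g strongly convex with parameter 1, F = identity)."""
    return 0.5 * x * x + abs(x)

def prox_point(x, rho):
    """V_rho(x) = argmin_y 0.5 * y^2 + |y| + (rho / 2) * (y - x)^2, via soft-thresholding."""
    u = rho * x
    return math.copysign(max(abs(u) - 1.0, 0.0), u) / (1.0 + rho)

def lemma_slack(z, x, rho):
    """Left-hand side minus right-hand side of the lemma, with L = rho because L_F = 0."""
    V = prox_point(x, rho)
    G = rho * (V - x)
    return f(z) - f(V) - (G * G - rho * G * (z - x)) / rho
```

On this instance the slack equals at least $\frac{1}{2}(z - V)^2$, so `lemma_slack` is nonnegative for every choice of `z`, `x` and `rho > 0`.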
Corollary 5.1. Suppose that Assumption A.7 holds and that the sequence of mappings $\{\varphi_k\}_{k\ge 0}$ is defined by (5.1). Then this sequence maintains the rule $(R_k^2)$.



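Proof. We prove by induction. For $k = 0$, it is easy to check that the rule $(R_0^2)$ is true. Assume that this rule holds for some $k \ge 0$. We prove it is true for $k+1$. Indeed, from the definition (5.1) of $\varphi_k$ and using Lemma 5.1 with $x = y^k$, we have

$$\begin{aligned}
\varphi_{k+1}(x) &= \varphi_k(x) + \alpha_k\left[f(V_{\rho_k}(y^k)) + \frac{1}{\rho_k}\left(\|G_{\rho_k}(y^k)\|^2 - LG_{\rho_k}(y^k)^T(x - y^k)\right)\right] \\
&\le a_kf(x) + \frac{1}{2}\|x - x^0\|^2 + \alpha_k\left[f(V_{\rho_k}(y^k)) + \frac{1}{\rho_k}\left(\|G_{\rho_k}(y^k)\|^2 - LG_{\rho_k}(y^k)^T(x - y^k)\right)\right] \\
&= a_{k+1}f(x) + \frac{1}{2}\|x - x^0\|^2 + \alpha_k\left[f(V_{\rho_k}(y^k)) + \frac{1}{\rho_k}\left(\|G_{\rho_k}(y^k)\|^2 - LG_{\rho_k}(y^k)^T(x - y^k)\right) - f(x)\right] \\
&\le a_{k+1}f(x) + \frac{1}{2}\|x - x^0\|^2.
\end{aligned}$$

This inequality shows that the rule $(R_{k+1}^2)$ is maintained. □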

Suppose that $v^k$ is the unique solution of the minimization of the function $\varphi_k$ on $\Omega$, i.e.

$$v^k := \mathrm{argmin}\left\{\varphi_k(x) \mid x \in \Omega\right\}, \tag{5.5}$$

and $\varphi_k^* := \varphi_k(v^k)$. We now generate three sequences $\{\tau_k\}_{k\ge 0}$, $\{y^k\}_{k\ge 0}$ and $\{x^k\}_{k\ge 0}$ by the scheme below:

$$\begin{aligned}
\tau_k &:= \frac{\alpha_k}{a_k + \alpha_k} \in (0, 1], \\
y^k &:= (1 - \tau_k)x^k + \tau_kv^k, \\
x^{k+1} &:= V_{\rho_k}(y^k).
\end{aligned} \tag{5.6}$$

The following lemma shows that the rule $(R_k^1)$ holds for the sequence $\{x^k\}_{k\ge 0}$ defined by (5.6).

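Lemma 5.2. Suppose that Assumption A.7 holds and that the sequences $\{\varphi_k\}_{k\ge 0}$ and $\{x^k\}_{k\ge 0}$ are defined by (5.1) and (5.6), respectively. Then

$$\varphi_{k+1}^* \ge a_{k+1}f(x^{k+1}) + \frac{1}{\rho_k^2}\left(a_{k+1}\rho_k - \frac{\alpha_k^2L^2}{2}\right)\|G_{\rho_k}(y^k)\|^2, \tag{5.7}$$

where $L := L_\phi L_F + \rho_k$ and $G_{\rho_k}(y^k) = \rho_k(x^{k+1} - y^k)$. Moreover, if $0 < \alpha_k \le \frac{1}{2}\left(q_k + \sqrt{q_k^2 + 4q_ka_k}\right)$, where $q_k := \frac{2\rho_k}{L^2}$, then the rule $(R_k^1)$ is maintained.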



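Proof. We again prove this lemma by induction. For $k = 0$ the rule $(R_0^1)$ is true. Assume that it holds for some $k \ge 0$; we now prove that $(R_{k+1}^1)$ holds. For simplicity of notation, we denote $G^k := G_{\rho_k}(y^k)$. Since $\varphi_k$ is strongly convex with parameter $\tau_\varphi = 1$, by the induction assumption we have

$$\varphi_k(z) \ge \varphi_k^* + \frac{1}{2}\|z - v^k\|^2 \ge a_kf(x^k) + \frac{1}{2}\|z - v^k\|^2, \quad \forall z \in \Omega.$$

Therefore, using this inequality and Lemma 5.1 for $z = x^k$, we have

$$\begin{aligned}
\varphi_{k+1}^* &= \min_{z\in\Omega}\left\{\varphi_k(z) + \alpha_k\left[f(x^{k+1}) + \frac{1}{\rho_k}\left(\|G^k\|^2 - L(G^k)^T(z - y^k)\right)\right]\right\} \\
&\ge \min_{z\in\Omega}\left\{a_kf(x^k) + \frac{1}{2}\|z - v^k\|^2 + \alpha_k\left[f(x^{k+1}) + \frac{1}{\rho_k}\left(\|G^k\|^2 - L(G^k)^T(z - y^k)\right)\right]\right\} \\
&\ge a_kf(x^{k+1}) + \frac{a_k}{\rho_k}\left[\|G^k\|^2 - L(G^k)^T(x^k - y^k)\right] + \alpha_kf(x^{k+1}) + \frac{\alpha_k}{\rho_k}\|G^k\|^2 \\
&\quad + \min_{z\in\Omega}\left\{\frac{1}{2}\|z - v^k\|^2 - \frac{\alpha_k}{\rho_k}L(G^k)^T(z - y^k)\right\} \\
&= a_{k+1}f(x^{k+1}) + \frac{a_{k+1}}{\rho_k}\|G^k\|^2 + \frac{a_k}{\rho_k}L(G^k)^T(y^k - x^k) \\
&\quad + \min_{z\in\Omega}\left\{\frac{1}{2}\|z - v^k\|^2 - \frac{\alpha_k}{\rho_k}L(G^k)^T(z - y^k)\right\}.
\end{aligned} \tag{5.8}$$

Let us denote the minimization term in the last line of (5.8) by $M_k$; then we have

$$\begin{aligned}
M_k &\ge \min_{z\in\mathbb{R}^n}\left\{\frac{1}{2}\|z - v^k\|^2 - \frac{\alpha_k}{\rho_k}L(G^k)^T(z - y^k)\right\} \\
&= -\frac{\alpha_k^2}{2\rho_k^2}L^2\|G^k\|^2 + \frac{\alpha_k}{\rho_k}L(G^k)^T(y^k - v^k).
\end{aligned} \tag{5.9}$$

Since $a_k(y^k - x^k) + \alpha_k(y^k - v^k) = 0$ by definition (5.6) of $y^k$, plugging this relation and (5.9) into (5.8) we obtain

$$\varphi_{k+1}^* \ge a_{k+1}f(x^{k+1}) + \frac{1}{\rho_k^2}\left(a_{k+1}\rho_k - \frac{\alpha_k^2L^2}{2}\right)\|G^k\|^2.$$

The inequality (5.7) is proved.

Moreover, we note that if $0 < \alpha_k \le \frac{1}{2}\left(q_k + \sqrt{q_k^2 + 4q_ka_k}\right)$ then $a_{k+1}\rho_k - \frac{\alpha_k^2L^2}{2} \ge 0$. Hence, $\varphi_{k+1}^* \ge a_{k+1}f(x^{k+1})$, i.e. the rule $(R_{k+1}^1)$ holds by induction. □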

According to Lemma 5.2, the sequence $\{\alpha_k\}_{k\ge 0}$ has to be chosen such that $0 < \alpha_k \le \frac{1}{2}\left(q_k + \sqrt{q_k^2 + 4q_ka_k}\right)$. For simplicity of discussion, in the following algorithm, we choose $\alpha_k := \frac{k+1}{4L_\phi L_F}$.

The sequence $\{\rho_k\}_{k\ge 0}$ is fixed at $\rho_k = L_\phi L_F$ for all $k \ge 0$.
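The admissibility of this step-size choice can be verified numerically. The sketch below assumes $\rho_k = L := L_\phi L_F$, so that the constant $L_\phi L_F + \rho_k$ of Lemma 5.2 equals $2L$, and reads the bound as $\frac{1}{2}(q_k + \sqrt{q_k^2 + 4q_ka_k})$ with $q_k = 2\rho_k/(2L)^2$; the function names are hypothetical.

```python
import math

def alpha(k, L):
    """Step weight alpha_k = (k + 1) / (4 L), with L standing for L_phi * L_F."""
    return (k + 1) / (4.0 * L)

def a_seq(k, L):
    """a_k = alpha_0 + ... + alpha_{k-1}, with a_0 = 0."""
    return sum(alpha(j, L) for j in range(k))

def alpha_upper(k, L):
    """Upper bound 0.5 * (q + sqrt(q^2 + 4 q a_k)) with rho_k = L and lemma constant 2 L."""
    q = 2.0 * L / (2.0 * L) ** 2
    return 0.5 * (q + math.sqrt(q * q + 4.0 * q * a_seq(k, L)))

def tau(k, L):
    """Interpolation weight tau_k = alpha_k / (a_k + alpha_k)."""
    return alpha(k, L) / (a_seq(k, L) + alpha(k, L))
```

With this choice one obtains $a_k = k(k+1)/(8L)$ and $\tau_k = 2/(k+2)$, and the bound on $\alpha_k$ holds for every $k \ge 0$.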

The accelerated variant of Algorithm 1 for solving problem (P) under Assumption A.7 is presented as follows.



Algorithm 2.

Initialization: Choose $x^0$ in $\Omega$ and fix the parameter $\rho_k := L_\phi L_F$ $(=: L)$ for all $k \ge 0$. Set $a_0 := 0$, $\varphi_0(z) := \frac{1}{2}\|z - x^0\|^2$, and $k := 0$.

Iteration k: For a given $x^k$, execute the four steps below:

Step 1: Compute $v^k$ by solving

$$v^k := \mathrm{argmin}\left\{\varphi_k(z) \mid z \in \Omega\right\}. \tag{5.10}$$

Step 2: Compute $y^k := \frac{k}{k+2}x^k + \frac{2}{k+2}v^k$.

Step 3: Solve the convex subproblem $P_2(x)$ with $x = y^k$ and $\rho := \rho_k = L$ to obtain a unique solution $x^{k+1} := V_L(y^k)$.

Step 4: Update $\varphi_{k+1}(x)$ by

$$\varphi_{k+1}(x) := \varphi_k(x) + \frac{k+1}{4L}\left[f(x^{k+1}) + \frac{1}{L}\|G_L(y^k)\|^2 - 2G_L(y^k)^T(x - y^k)\right]. \tag{5.11}$$

Increase $k$ by 1 and go back to Step 1.

At Step 4 of Algorithm 2, updating the function $\varphi_k$ requires the Lipschitz constants $L_\phi$ and $L_F$. If they are not available, a line-search strategy should be used to estimate these constants. Problem (5.10) at Step 1 is the minimization of a quadratic function over a convex set. The computational cost of solving this problem depends on the complexity of $\Omega$.
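The four steps of Algorithm 2 can be sketched on a scalar toy problem. The code below assumes $\Omega = \mathbb{R}$, $g(x) = (x-1)^2$ (so $\tau_g = 2$), $\phi = |\cdot|$ and $F = \sin$ (so $L_\phi L_F = 1$); for this instance the subproblem has a closed-form solution and the minimizer of $\varphi_k$ is $v^k = x^0 + \sum_{j<k} 2\alpha_j G_L(y^j)$. All names (`solve_p2`, `accelerated_scheme`, the constants `TAU`, `C`, `L`) are hypothetical, and the example is an illustration under these assumptions rather than a general implementation.

```python
import math

TAU, C, L = 2.0, 1.0, 1.0  # g(x) = (TAU/2)(x - C)^2; L = L_phi * L_F = 1 for phi = |.|, F = sin

def f(x):
    return 0.5 * TAU * (x - C) ** 2 + abs(math.sin(x))

def solve_p2(x, rho):
    """Minimize g(y) + |sin(x) + cos(x)(y - x)| + (rho/2)(y - x)^2 over y in R."""
    a, b = math.cos(x), math.sin(x) - math.cos(x) * x  # linearized F(y) ~ a*y + b
    curv = TAU + rho
    y0 = (TAU * C + rho * x) / curv                    # minimizer of the smooth part
    if abs(a) < 1e-14:
        return y0
    u0 = a * y0 + b
    # soft-thresholding in the variable u = a*y + b with threshold a^2 / curv
    u = math.copysign(max(abs(u0) - a * a / curv, 0.0), u0)
    return (u - b) / a

def accelerated_scheme(x0, iters):
    x, s, history = x0, 0.0, []
    for k in range(iters):
        v = x0 + s                            # Step 1: minimizer of phi_k over R
        y = (k * x + 2.0 * v) / (k + 2.0)     # Step 2: y^k
        x = solve_p2(y, L)                    # Step 3: x^{k+1} with rho_k = L
        G = L * (x - y)
        s += (k + 1) / (4.0 * L) * 2.0 * G    # Step 4: accumulate the linear term of phi_{k+1}
        history.append(x)
    return history
```

For example, `accelerated_scheme(3.0, 40)` returns the iterates $x^1, \dots, x^{40}$ started from $x^0 = 3$; in the unconstrained case only the accumulated linear term `s` of $\varphi_k$ needs to be stored, which is why Step 1 reduces to a single addition here.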

The following theorem proves the convergence of Algorithm 2 and shows that the global complexity bound is $O\left(\frac{L_\phi L_F\|x^0 - x^*\|^2}{k^2}\right)$.

Theorem 5.1. If the sequence $\{x^k\}_{k\ge 0}$ is generated by Algorithm 2 for solving problem (P) and Assumption A.7 is satisfied then, for $k \ge 1$, we have

$$f(x^k) - f(x^*) \le \frac{4L\|x^0 - x^*\|^2}{k(k+1)}, \tag{5.12}$$

where $x^*$ is a stationary point of (P) and $L = L_\phi L_F$ as in Algorithm 2.

Proof. From the formula for computing $\tau_k$ at Step 2 of Algorithm 2, it follows that $\alpha_k = \frac{k+1}{4L}$ and, as a consequence, $a_k = \sum_{j=0}^{k-1}\alpha_j = \frac{k(k+1)}{8L}$. Moreover, since $\rho_k = L$ and the constant $L_\phi L_F + \rho_k$ of Lemma 5.2 equals $2L$ here, we have

$$a_{k+1}\rho_k - \frac{\alpha_k^2(2L)^2}{2} = \frac{(k+1)(k+2)}{8} - \frac{(k+1)^2}{32L^2}(2L)^2 = \frac{(k+1)(k+2)}{8} - \frac{(k+1)^2}{8} = \frac{k+1}{8} > 0.$$

Therefore, the sequence $\{x^k\}_{k\ge 0}$ generated by Algorithm 2 satisfies the assumptions of Corollary 5.1 and Lemma 5.2. Thus the rules $(R_k^1)$ and $(R_k^2)$ are maintained. Using these rules, we deduce

$$a_kf(x^k) \le \varphi_k^* \le a_kf(x^*) + \frac{1}{2}\|x^0 - x^*\|^2.$$

Consequently, we obtain

$$f(x^k) - f(x^*) \le \frac{\|x^0 - x^*\|^2}{2a_k} = \frac{4L\|x^0 - x^*\|^2}{k(k+1)}.$$

The theorem is proved. □
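The bound of Theorem 5.1 translates directly into an iteration count for a target accuracy. The sketch below assumes the bound has the form $4L\|x^0 - x^*\|^2/(k(k+1))$ and returns the smallest $k$ for which it drops below a tolerance; the function name `iterations_for_accuracy` and its arguments are hypothetical.

```python
def iterations_for_accuracy(L, dist0, eps):
    """Smallest k >= 1 with 4 * L * dist0**2 / (k * (k + 1)) <= eps."""
    k = 1
    while 4.0 * L * dist0**2 / (k * (k + 1)) > eps:
        k += 1
    return k
```

Since $k(k+1) \ge 4L\|x^0 - x^*\|^2/\varepsilon$ is needed, the count grows like $2\|x^0 - x^*\|\sqrt{L/\varepsilon}$ as $\varepsilon$ decreases.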